28 research outputs found

Object-Oriented Dynamics Learning through Multi-Level Abstraction

Author: Lin Zichuan
Ren Zhizhou
Wang Jianhao
Zhang Chongjie
Zhu Guangxiang
Publication date: 05/12/2019

Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties in common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in terms of sample efficiency and generalization to novel environments when learning environment models. We also demonstrate that the learned dynamics models enable efficient planning in unseen environments, comparable to planning with true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations.

Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

Author: Liang Yitao
Liu Anji
Ma Jianzhu
Peng Jian
Ren Zhizhou
Publication date: 19/11/2022

Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence. Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot adaptation to unseen tasks. Despite recent advances in meta-RL, most existing methods require access to the environmental reward function of new tasks to infer the task objective, which is not realistic in many practical applications. To bridge this gap, we study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning. We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback. The agent can adapt to new tasks by querying a human's preference between behavior trajectories instead of using per-step numeric rewards. By extending techniques from information theory, our approach can design query sequences that maximize the information gain from human interactions while tolerating the inherent error of a non-expert human oracle. In experiments, we extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a variety of meta-RL benchmark tasks and demonstrate substantial improvement over baseline algorithms in terms of both feedback efficiency and error tolerance.

Comment: Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv.org e-Print Archive

Self-Organized Polynomial-Time Coordination Graphs

Author: Dong Weijun
Ren Zhizhou
Wang Jianhao
Wang Tonghan
Yang Qianlan
Zhang Chongjie
Publication date: 16/09/2022

The coordination graph is a promising approach to modeling agent collaboration in multi-agent reinforcement learning. It performs a graph-based value factorization and induces explicit coordination among agents to complete complicated tasks. However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values. This selection amounts to a decentralized constraint optimization problem (DCOP); both the problem itself and its constant-ratio approximation are NP-hard. To bypass this systematic hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the accuracy and computational efficiency of collaborative action selection. SOP-CG employs dynamic graph topology to ensure sufficient value function expressiveness. The graph selection is unified into an end-to-end learning paradigm. In experiments, we show that our approach learns succinct and well-adapted graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.

arXiv.org e-Print Archive

Transmission spectra and valley processing of graphene and carbon nanotube superlattices with inter-valley coupling

Author: Bin Wang
Fuming Xu
Hill A
McGuire L M
Ren Y
Wakabayashi K
Yadong Wei
Yafei Ren
Zhai F
Zhenhua Qiao
Zhizhou Yu
Zu F
Publication venue: IOP Publishing

Crossref

Measurement of neutron-induced fission cross sections of

Author: Han Yi
Hantao Jing
Jingyu Tang
Qiang Li
Rong Liu
Ruirui Fan
Wei Jiang
Yang Li
Yiwei Yang
Yonghao Chen
Zhixin Tan
Zhizhou Ren
Publication venue: EDP Sciences
Publication date: 26/05/2023

235U and 238U are very important isotopes in the nuclear energy system. Their neutron-induced fission cross sections have been measured intensively and evaluated as standards up to 200 MeV. However, experimental data in the high-energy region are scarce. This work reports the measurement of 235, 238U(n, f) cross sections relative to n-p scattering, performed at the China Spallation Neutron Source (CSNS) back-streaming neutron facility (Back-n). Preliminary results for the 235, 238U(n, f) cross sections from 10 to 66 MeV are obtained; they generally follow the shape of the IAEA standard. However, significant discrepancies are observed at some energies, which will be studied further.

EDP Sciences OAI-PMH repository (1.2.0)